hwmon: disambiguate colliding chip labels#3646

Open
mwimpelberg28 wants to merge 4 commits into prometheus:master from mwimpelberg28:mwimpelberg/hwmon-dedup-chip-labels
@mwimpelberg28

Summary

Fixes #3637.

Multiple hwmon nodes can be registered under a single parent device — for example, asus-nb-wmi on recent ASUS laptops registers one hwmon for fan control and another for WMI sensors. Both device symlinks resolve to the same /sys/devices/platform/asus-nb-wmi, so hwmonName produces the same platform_asus_nb_wmi chip label for both, and any sensor file that exists in both nodes (e.g. pwm1_enable) trips:

collected metric "node_hwmon_pwm_enable" { ... chip="platform_asus_nb_wmi" ... } was collected before with the same name and label values

Approach

Update now does two passes:

  • Pass 1: enumerate /sys/class/hwmon/*, compute the device-derived base chip name for each, and count collisions.
  • Pass 2: when a base name is shared, suffix the chip label with the chip's name file content if it disambiguates, otherwise with the hwmonX basename (always unique within a boot). The include/exclude filter is also moved here so user regexes match the label that is actually emitted in the metric.

Entries that already produce a unique chip label are unaffected — no surprise suffixes for users not hitting the collision.
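
The two passes above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `hwmonEntry` type, the `chipLabels` function, the `_` separator, and the sample `name` values are all assumptions made for the example, and sanitizing the `name` file content is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// hwmonEntry is a hypothetical stand-in for one /sys/class/hwmon/hwmonX node.
type hwmonEntry struct {
	dir      string // hwmonX basename, e.g. "hwmon3"
	baseChip string // device-derived base chip name
	name     string // content of the node's "name" file, may be empty
}

// chipLabels maps each hwmonX basename to the chip label it would emit.
func chipLabels(entries []hwmonEntry) map[string]string {
	type key struct{ base, name string }

	// Pass 1: count how often each base name, and each (base, name)
	// pair, occurs across all enumerated nodes.
	baseCount := make(map[string]int)
	nameCount := make(map[key]int)
	for _, e := range entries {
		baseCount[e.baseChip]++
		nameCount[key{e.baseChip, e.name}]++
	}

	// Pass 2: suffix only the entries whose base name collides.
	labels := make(map[string]string, len(entries))
	for _, e := range entries {
		switch {
		case baseCount[e.baseChip] == 1:
			// Unique base name: label is left unchanged.
			labels[e.dir] = e.baseChip
		case e.name != "" && nameCount[key{e.baseChip, e.name}] == 1:
			// The "name" file content tells the colliding nodes apart.
			labels[e.dir] = e.baseChip + "_" + e.name
		default:
			// Names collide too: fall back to the hwmonX basename.
			labels[e.dir] = e.baseChip + "_" + e.dir
		}
	}
	return labels
}

func main() {
	entries := []hwmonEntry{
		{dir: "hwmon0", baseChip: "platform_asus_nb_wmi", name: "fan_ctl"},
		{dir: "hwmon1", baseChip: "platform_asus_nb_wmi", name: "wmi_sensors"},
		{dir: "hwmon2", baseChip: "platform_coretemp_0", name: "coretemp"},
	}
	labels := chipLabels(entries)

	// Print in a stable order, since map iteration order is random.
	dirs := make([]string, 0, len(labels))
	for d := range labels {
		dirs = append(dirs, d)
	}
	sort.Strings(dirs)
	for _, d := range dirs {
		fmt.Printf("%s -> %s\n", d, labels[d])
	}
}
```

In this sketch the include/exclude filter would run on the values of the returned map, so a user regex sees the suffixed label rather than the base name.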

This is similar in spirit to the discussion in #333 (the same class of bug for dual-socket coretemp boxes), but more contained: the fix only kicks in when an actual collision is detected.

Test plan

New collector/hwmon_linux_test.go:

  • TestHwmonDuplicateChipNamesAreDisambiguated: reproduces the ASUS WMI scenario from #3637 (two hwmon dirs sharing one platform device, both exposing pwm1_enable) and asserts both that Gather succeeds and that the chip labels are distinct.
  • TestHwmonUniqueChipNamesAreUnchanged — guards against unintended label drift for users not hitting the collision.
  • TestHwmonDuplicateChipNamesWithSameNameFile — exercises the hwmonX-basename fallback when the name file content also collides.
  • Full collector test suite still passes (go test ./collector/), including the existing fixture-driven e2e checks.
  • go vet ./... clean.

Multiple hwmon nodes can be registered under a single parent device
(for example asus-nb-wmi exposes one hwmon for fan control and another
for WMI sensors). Both currently resolve to the same chip label
(`platform_asus_nb_wmi`) and trigger "metric collected before with the
same name and label values" errors at scrape time.

Detect this collision in a first pass and append the chip's `name` file
content (or the hwmonX basename if names also collide) to the chip
label in a second pass. The include/exclude filter is moved into the
same pass so user regexes match the label that is actually emitted.

Fixes: prometheus#3637

Signed-off-by: Matthew Wimpelberg <matt.wimpelberg@grafana.com>
@mwimpelberg28
Author

@SuperQ would you have a moment to take a look? Happy to address any feedback.

@SuperQ
Member

SuperQ commented May 15, 2026

This should add to the test fixtures so that it's tested in the end-to-end test.

Comment thread collector/hwmon_linux_test.go Outdated
mwimpelberg28 and others added 2 commits May 15, 2026 06:20
Co-authored-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Matthew Wimpelberg <120263653+mwimpelberg28@users.noreply.github.com>
Two new hwmon nodes share a single platform device (asus-nb-wmi) with
distinct `name` file contents, exercising the disambiguation path in
the end-to-end test. Without the fix in the previous commit, the
duplicate base chip name `platform_asus_nb_wmi` would have triggered a
registry error before any metrics were scraped.

Also expand the e2e chip-include regex to admit the new chips so
their disambiguated labels appear in the expected output.

Signed-off-by: Matthew Wimpelberg <matt.wimpelberg@grafana.com>
mwimpelberg28 requested a review from SuperQ May 15, 2026 16:40
@mwimpelberg28
Author

> This should add to the test fixtures so that it's tested in the end-to-end test.

This should be fixed.
